Regressional Plane
Even though we have three variables in our model, we can create a three dimensional regresssional plane by keep Underlying Health Conditions constant within our model, thus we can create a three dimensional regressional plane of Population Density and Covid Case Rate and project it against the number of COVID-19 deaths. The graph contains not only the regressional plane but a 3D scatterplot of the original data (Note: this is per state):
popMean = mean(orig$`Population Density`)
popSd = sd(orig$`Population Density`)
caseMean = mean(orig$`Covid Case Rate`)
caseSd = sd(orig$`Covid Case Rate`)
myPred = function(model, underlyingConstant, x) {
pred = matrix(0, ncol=1, nrow=nrow(x))
for(i in 1:nrow(x)) {
a = cbind(underlyingConstant, x[i,])
colnames(a) = c("Underlying Health Conditions", "Covid Case Rate", "Population Density")
pred[i,1]=predict(model, a)
}
exp(pred)
}
ind = as.data.frame((matrix(c(0, 1, 2, 3), ncol=4)*popSd)+popMean)
ind2 = as.data.frame((matrix(c(-2, 1, 0, 1, 2, 3), nrow=1)*caseSd)+caseMean)
colnames(ind) = c("0", "1", "2", "3")
rownames(ind) = c("Pop. D.")
colnames(ind2) = c("-2", "-1", "0", "1", "2", "3")
rownames(ind2) = c("C. R.")
20% Quantile
Here we plot Covid Case Rate and Population Density with a constant Underlying Health Condition equal to its \(20\)% quantile against Covid Death Rate. Note that Covid Death Rate is in its original metrics but Covid Case Rate and Population Density are in their z-scores.
xVal = dataFA$`Covid Case Rate`
yVal = dataFA$`Population Density`
xPred = seq(min(xVal), max(xVal))
yPred = seq(min(yVal), max(yVal))
zPred = expand.grid(`Covid Case Rate` = xPred, `Population Density`=yPred, KEEP.OUT.ATTRS = FALSE)
zPred$`Covid Death Rate` = (myPred(modelFa, quantile(dataFA$`Underlying Health Conditions`, 0.2), zPred))
zPred = acast(zPred, `Covid Case Rate`~`Population Density`, value.var="Covid Death Rate")
myPlot = plot_ly(dataFA, x=~`Covid Case Rate`, y=~`Population Density`, z=~`Covid Death Rate`, type="scatter3d")
myPlot = add_trace(p=myPlot, z = zPred, x=xPred, y=yPred, type = "surface")
myPlot
## No scatter3d mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
For reference in the plot, here are the zscore values (column names) and their corresponding original variable values:
Population Density (People per square mile):
ind
## 0 1 2 3
## Pop. D. 203.8526 468.4812 733.1099 997.7386
Case Rate (Number of cases per person)
ind2
## -2 -1 0 1 2 3
## C. R. 0.003618468 0.03964285 0.02763473 0.03964285 0.05165098 0.06365911
50% Quantile
Here we plot Covid Case Rate and Population Density with a constant Underlying Health Condition equal to its \(50\)% quantile against Covid Death Rate. Note that Covid Death Rate is in its original metrics but Covid Case Rate and Population Density are in their z-scores.
xVal = dataFA$`Covid Case Rate`
yVal = dataFA$`Population Density`
xPred = seq(min(xVal), max(xVal))
yPred = seq(min(yVal), max(yVal))
zPred = expand.grid(`Covid Case Rate` = xPred, `Population Density`=yPred, KEEP.OUT.ATTRS = FALSE)
zPred$`Covid Death Rate` = (myPred(modelFa, quantile(dataFA$`Underlying Health Conditions`, 0.5), zPred))
zPred = acast(zPred, `Covid Case Rate`~`Population Density`, value.var="Covid Death Rate")
myPlot = plot_ly(dataFA, x=~`Covid Case Rate`, y=~`Population Density`, z=~`Covid Death Rate`, type="scatter3d")
myPlot = add_trace(p=myPlot, z = zPred, x=xPred, y=yPred, type = "surface")
myPlot
## No scatter3d mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
For reference in the plot, here are the zscore values (column names) and their corresponding original variable values:
Population Density (People per square mile):
ind
## 0 1 2 3
## Pop. D. 203.8526 468.4812 733.1099 997.7386
Case Rate (Number of cases per person)
ind2
## -2 -1 0 1 2 3
## C. R. 0.003618468 0.03964285 0.02763473 0.03964285 0.05165098 0.06365911
80% Quantile
Here we plot Covid Case Rate and Population Density with a constant Underlying Health Condition equal to its \(80\)% quantile against Covid Death Rate. Note that Covid Death Rate is in its original metrics but Covid Case Rate and Population Density are in their z-scores.
xVal = dataFA$`Covid Case Rate`
yVal = dataFA$`Population Density`
xPred = seq(min(xVal), max(xVal))
yPred = seq(min(yVal), max(yVal))
zPred = expand.grid(`Covid Case Rate` = xPred, `Population Density`=yPred, KEEP.OUT.ATTRS = FALSE)
zPred$`Covid Death Rate` = (myPred(modelFa, quantile(dataFA$`Underlying Health Conditions`, 0.8), zPred))
zPred = acast(zPred, `Covid Case Rate`~`Population Density`, value.var="Covid Death Rate")
myPlot = plot_ly(dataFA, x=~`Covid Case Rate`, y=~`Population Density`, z=~`Covid Death Rate`, type="scatter3d")
myPlot = add_trace(p=myPlot, z = zPred, x=xPred, y=yPred, type = "surface")
myPlot
## No scatter3d mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
For reference in the plot, here are the zscore values (column names) and their corresponding original variable values:
Population Density (People per square mile):
ind
## 0 1 2 3
## Pop. D. 203.8526 468.4812 733.1099 997.7386
Case Rate (Number of cases per person)
ind2
## -2 -1 0 1 2 3
## C. R. 0.003618468 0.03964285 0.02763473 0.03964285 0.05165098 0.06365911
Confidence Intervals
Here we will construt \(95\)% confidence intervals for each of the \(\beta\) coefficients.
confint(modelFa)
## 2.5 % 97.5 %
## (Intercept) -7.58560075 -7.264241203
## `Underlying Health Conditions` -0.06741585 0.206921155
## I(`Underlying Health Conditions`^2) -0.13180027 -0.008057839
## `Covid Case Rate` 0.26101544 0.517177222
## I(`Covid Case Rate`^2) -0.17101328 -0.005794841
## `Population Density` 0.35466854 0.868085641
## I(`Population Density`^2) -0.19632229 -0.002947855
We can see here that our 95% confidence interval that the underlying mean value of the intercept is negative. From Covid Case Rate, we can say that a single increase in Covid Case Rate standard deviation per state will increase COVID-19 death rates by some value in between \((0.261,0.5171)\) subracted by the squared unit value in between \((-0.171, -0.005)\). For Population Density we can say that a single increase in Population Density standard deviation per state will increase COVID-19 death rates by some value in between \((0.3455,0.8688)\) subracted by the squared unit value in between \((-0.1963, -0.0029)\). It was stated earlier that the p-value for our T test statistic for the single order term in Underlying Health Conditions was non-signficiant, and we can see that here when the 95% confidence interval contains zero. Here we have two possible interpretations. If the true underlying mean for Underlying Health Conditions is shown to be positive, then it is an influential predictor in our model like the rest of the single order terms, which will eventually be taken over by their negative second order terms as the unit sizes grow larger; however, if the true mean was shown to be zero then a totally different intrepretation will be derived. If the mean of Underlying Health Conditions was shown to be zero, then the only term from the variable would be the negative quadratic. Therefore, as Underlying Health Conditions increase, COVID-19 death rates would be shown to decrease instead of increase if the single order term was influential. The interpretation behind this may be that as states with higher morbidity rates in these Underlying Health Conditions would have lower COVID-19 morbidity rates because the persons who would have died from COVID-19 have already died from these Underlying Health Conditions.
Now let us check for the difference in \(\beta\) coefficients:
x = model.matrix(~`Underlying Health Conditions` + I(`Underlying Health Conditions`^2) + `Covid Case Rate` +
I(`Covid Case Rate`^2) + `Population Density` + I(`Population Density`^2), data=dataFA)
y = data$`Covid Death Rate`
c = solve(t(x)%*%x)
beta = c%*%t(x)%*%y
t = qt(1-0.05/2, nrow(data)-(3))
s = summary(modelFa)$sigma
Difference between Underlying Health Conditions and Covid Case Rate \((\beta_1+\beta_2)-(\beta_3+\beta_4)\):
a = c(0, 1, 1, -1, -1, 0, 0)
l = a%*%beta
temp = t*s*sqrt(t(a)%*%c%*%a)
m=matrix(c(l-temp, l+temp), ncol=2)
rownames(m) = c("b1/2-b3/4")
colnames(m) = c("Lwr", "Uppr")
as.data.frame(m)
## Lwr Uppr
## b1/2-b3/4 -0.5035708 -0.0765703
Here we can see that our 95% confidence interval is negative, implying with confidence that the mean difference between Underlying Health Conditions contributes less than Covid Case Rate to our model to some value in between \((-0.5, -0.07)\).
Difference between Underlying Health Conditions and Population Density \((\beta_1+\beta_2)-(\beta_5+\beta_6)\):
a = c(0, 1, 1, 0, 0, -1, -1)
l = a%*%beta
temp = t*s*sqrt(t(a)%*%c%*%a)
m=matrix(c(l-temp, l+temp), ncol=2)
rownames(m) = c("b1/2-b5/6")
colnames(m) = c("Lwr", "Uppr")
as.data.frame(m)
## Lwr Uppr
## b1/2-b5/6 -0.9716656 -0.4964539
Here we can see that our 95% confidence interval is negative, implying with confidence that the mean difference between Underlying Health Conditions contributes less than Population Density to our model to some value in between \((-0.97, -0.496)\).
Difference between Covid Case Rate and Population Density \((\beta_3+\beta_4)-(\beta_5+\beta_6)\):
a = c(0, 0, 0, 1, 1, -1, -1)
l = a%*%beta
temp = t*s*sqrt(t(a)%*%c%*%a)
m=matrix(c(l-temp, l+temp), ncol=2)
rownames(m) = c("b3/4-b5/6")
colnames(m) = c("Lwr", "Uppr")
as.data.frame(m)
## Lwr Uppr
## b3/4-b5/6 -0.6275606 -0.2604179
Here we can see that our 95% confidence interval is negative, implying with confidence that the mean difference between Covid Case Rate contributes less than Population Density to our model to some value in between \((-0.627, -0.26)\).
In summary, Population Density is the most influential factor in our model, followed by Covid Case Rate, and trailed by Underlying Health Conditions.
Point Estimates
Here we will construct \(95\)% confidence intervals for COVID-19 Death Rates estimated from our model’s regression along with the observed values and their difference with the point estimates.
ci=exp(predict(modelFa, interval="confidence"))
ci=cbind(orig$`Covid Death Rate`, ci[,1],orig$`Covid Death Rate`-ci[,1], ci[,2], ci[,3])
colnames(ci) = c("Actual", "Fitted", "Difference", "Lower", "Upper")
paged_table(as.data.frame(ci))
Underlying Health Conditions
Underlying Health Conditions contributes the least to our model but COVID-19 death rates will increase per state for every unit increase in this factor loading with 95% confidence by some value \((-0.067, 0.2069)\) subtracted by the square of \((-0.1318, -0.008)\). As stated before, one unit increase in Underlying Health Conditions is heavily correlated to a unit increase in Diabetes, Kidney Disease, Chronic Lower Respiratory Disease, Heart Disease, and Cancer mortality rates:
fa$loadings
##
## Loadings:
## PA1
## Diabetes Rate 0.856
## Hypertension Rate 0.557
## Kidney Disease Rate 0.974
## Chronic Lower Respiratory Disease Rate 0.837
## Heart Disease Rate 0.901
## Cancer Rate 0.974
##
## PA1
## SS loadings 4.451
## Proportion Var 0.742
Where each variable is incresed by the weights:
fa$weights
## PA1
## Diabetes Rate 0.17491006
## Hypertension Rate 0.02108163
## Kidney Disease Rate 0.33223086
## Chronic Lower Respiratory Disease Rate 0.08812859
## Heart Disease Rate 0.10055472
## Cancer Rate 0.33222736
As one can see, as Kidney Disease and Cancer Rates seem to be the most influential variables to the regression. Thus, as these variables increase by one unit, we expect that Underlying Health Conditions to increase by these values.